Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors: the differences between the two are modeled on categories that have both image-level and bounding-box annotations, and this information is transferred to convert classifiers into detectors for categories without bounding-box annotations. We improve on this work by incorporating knowledge about object similarities from both the visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories share more transferable properties than dissimilar ones; e.g., a better cat detector results from transferring the differences between a dog classifier and a dog detector to the cat class than from transferring those of the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object-similarity-based knowledge transfer methods outperforms the baseline methods. We find strong evidence that visual similarity and semantic relatedness are complementary for this task and, when combined, notably improve detection, achieving state-of-the-art performance in a semi-supervised setting.
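The transfer idea above can be illustrated with a minimal sketch. This is not the paper's actual model; all weight vectors, category names, and similarity scores below are hypothetical toy values, and the transfer rule shown (adding a similarity-weighted average of source categories' classifier-to-detector differences to a target classifier) is only one simple instantiation of the intuition described.

```python
import numpy as np

# Hypothetical toy data: 3 source categories (dog, horse, violin) with both
# classifier and detector weights, and one target category (cat) with only a
# classifier. All weight vectors are 4-dimensional purely for illustration.
src_classifiers = np.array([[1.0, 0.5, 0.2, 0.0],   # dog
                            [0.9, 0.4, 0.3, 0.1],   # horse
                            [0.1, 0.8, 0.7, 0.9]])  # violin
src_detectors   = np.array([[1.2, 0.7, 0.1, 0.0],   # dog
                            [1.1, 0.6, 0.2, 0.1],   # horse
                            [0.2, 1.0, 0.6, 1.0]])  # violin
cat_classifier  = np.array([1.0, 0.6, 0.2, 0.1])

# Hypothetical similarity of "cat" to each source category; in practice such
# scores would come from visual features and semantic relatedness measures.
similarity = np.array([0.8, 0.6, 0.05])  # dog, horse, violin

def transfer_detector(classifier, src_cls, src_det, sim):
    """Estimate a detector for a box-less category by adding a
    similarity-weighted average of the source categories'
    classifier-to-detector differences to its classifier."""
    weights = sim / sim.sum()            # normalize similarities to sum to 1
    diffs = src_det - src_cls            # per-category transfer offsets
    return classifier + weights @ diffs  # similarity-weighted transfer

cat_detector = transfer_detector(cat_classifier, src_classifiers,
                                 src_detectors, similarity)
print(cat_detector)
```

Because the dog and horse similarities dominate the weighting, the estimated cat detector inherits mostly their classifier-to-detector offsets, while the dissimilar violin category contributes almost nothing.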